Abstract

Consider independent observations $(X_1, R_1), (X_2, R_2), \ldots, (X_n, R_n)$ with random or fixed ranks $R_i \in \{1, 2, \ldots, m\}$, while conditional on $R_i = k$, the random variable $X_i$ has the same distribution as the $k$-th order statistic within a random sample of size $m$ from an unknown continuous distribution function $F$. Such observation schemes are utilized in situations in which ranking observations is much easier than obtaining their precise values. Two well-known special cases are ranked set sampling (McIntyre 1952) with $m = n$ and $R_i = i$, and judgement post-stratification (MacEachern et al. 2004) with $R_i \sim \mathrm{Unif}(\{1, 2, \ldots, m\})$.

One goal is to compute pointwise confidence intervals for the distribution function $F$ with guaranteed coverage probability for finite sample sizes. We propose a solution for this task which is based on the conditional distribution of the naive empirical distribution function, given the ranks $R_1, R_2, \ldots, R_n$. This procedure motivates a new estimator for the whole distribution function $F$. Within the setting of judgement post-stratification we analyze and compare the asymptotic distribution of the new estimator, the stratified estimator of Stokes and Sager (1988) and the nonparametric maximum-likelihood estimator of Kvam and Samaniego (1994). It turns out that the former two estimators are asymptotically equivalent, and that the latter estimator is asymptotically more efficient, although the efficiency gain is rather small.



1 Introduction 

Ranked set sampling and judgement post-stratification are both sampling strategies in 
situations in which ranking several observations is possible and relatively easy without 
referring to exact values whereas obtaining complete observations is much more involved. 
For instance, this occurs often in agriculture or forestry when the quantities of interest are 
yields on different plots or of different trees. 

The general observation scheme is as follows. Let $X_{ij}$, $1 \le i \le n$, $1 \le j \le m$, be independent random variables with unknown continuous distribution function $F$. That means, we consider $n$ independent random samples from $F$ of size $m$ each. Instead of the whole $i$-th sample $(X_{ij})_{j=1}^m$ we observe only one of its elements, denoted by $X_i$, and its rank $R_i = \sum_{j=1}^m 1\{X_{ij} \le X_i\} \in \{1, 2, \ldots, m\}$.

In ranked set sampling (RSS), as introduced by McIntyre (1952), we pick a fixed number $R_i \in \{1, 2, \ldots, m\}$. Then we obtain the value $X_i := X_{i:R_i}$, where $X_{i:1} < X_{i:2} < \cdots < X_{i:m}$ are the order statistics of $(X_{ij})_{j=1}^m$. In the simplest case, $m = n$ and $R_i = i$. In case of $m \ll n$ one often tries to achieve a balanced sample in the sense that the numbers

$$
N_{nk} := \sum_{i=1}^n 1\{R_i = k\}, \quad 1 \le k \le m,
$$

are essentially identical.

Judgement post-stratification (JPS), as introduced by MacEachern et al. (2004), means that from the $i$-th sample we obtain only its first element $X_i := X_{i1}$ and its random rank $R_i = \sum_{j=1}^m 1\{X_{ij} \le X_{i1}\}$. Here the numbers $N_{nk}$ are random variables with binomial distribution $\mathrm{Bin}(n, 1/m)$. The whole vector $(N_{nk})_{k=1}^m$ follows a multinomial distribution $\mathrm{Mult}(n; 1/m, \ldots, 1/m)$.
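The two sampling schemes can be illustrated by a small simulation. The following Python sketch is only an illustration under stated assumptions: $F$ is taken to be the uniform distribution on $[0,1]$, and the function names `draw_rss` and `draw_jps` are hypothetical helpers, not part of any cited method.

```python
import random

def draw_rss(ranks, m, rng):
    """Ranked set sampling: for each prescribed rank r, draw a fresh sample
    of size m and keep only its r-th order statistic together with r."""
    data = []
    for r in ranks:
        sample = sorted(rng.random() for _ in range(m))
        data.append((sample[r - 1], r))
    return data

def draw_jps(n, m, rng):
    """Judgement post-stratification: keep the first element of each sample
    of size m together with its (random) rank within that sample."""
    data = []
    for _ in range(n):
        sample = [rng.random() for _ in range(m)]
        x = sample[0]
        r = sum(1 for v in sample if v <= x)
        data.append((x, r))
    return data

rng = random.Random(1)
m, n = 4, 12
design = [1 + i % m for i in range(n)]      # balanced RSS design
rss = draw_rss(design, m, rng)
jps = draw_jps(n, m, rng)
# Stratum sizes N_n1, ..., N_nm of the JPS sample
counts = [sum(1 for _, r in jps if r == k) for k in range(1, m + 1)]
print(counts)
```

In the RSS sample the stratum sizes are fixed by the design, whereas in the JPS sample they vary from run to run.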

Both RSS and JPS lead to independent random variables $(X_1, R_1), (X_2, R_2), \ldots, (X_n, R_n)$ with fixed or random ranks $R_i \in \{1, 2, \ldots, m\}$. Conditional on $R_i = k$, the random variable $X_i$ has distribution function $F_k$ given by

$$
F_k(x) := P(X_i \le x \mid R_i = k) = B_k(F(x)),
$$

where $B_k : [0,1] \to [0,1]$ denotes the beta distribution function with parameters $k$ and $m + 1 - k$. Thus for $p \in [0,1]$,

$$
B_k(p) := \sum_{j=k}^m \binom{m}{j} p^j (1-p)^{m-j} = \int_0^p \beta_k(u) \, du
$$

with

$$
\beta_k(u) := C_k u^{k-1} (1-u)^{m-k} \quad\text{and}\quad C_k := \frac{m!}{(k-1)! \, (m-k)!} = m \binom{m-1}{k-1}.
$$
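For concreteness, the functions $B_k$ and $\beta_k$ can be evaluated directly from the formulas above. The sketch below uses only the Python standard library; the names `beta_cdf` and `beta_pdf` are hypothetical. It also checks numerically the identity $m^{-1} \sum_{k=1}^m B_k(p) = p$, which is used repeatedly later on.

```python
from math import comb

def beta_cdf(k, m, p):
    """B_k(p): c.d.f. of the beta distribution with parameters k and m+1-k,
    evaluated via the binomial-sum representation."""
    return sum(comb(m, j) * p**j * (1.0 - p)**(m - j) for j in range(k, m + 1))

def beta_pdf(k, m, u):
    """beta_k(u) = C_k u^(k-1) (1-u)^(m-k) with C_k = m * binom(m-1, k-1)."""
    return m * comb(m - 1, k - 1) * u**(k - 1) * (1.0 - u)**(m - k)

m, p = 4, 0.3
cdf_values = [beta_cdf(k, m, p) for k in range(1, m + 1)]
pdf_values = [beta_pdf(k, m, p) for k in range(1, m + 1)]
print(cdf_values)
print(sum(cdf_values) / m)   # equals p up to rounding
print(sum(pdf_values))       # equals m up to rounding
```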



A good overview of existing literature about analyzing ranked set samples is given by Wolfe (2012). Several estimators of the c.d.f. $F$ have been proposed. Of course one could just ignore the rank information and compute the empirical c.d.f. $\hat F_n$,

$$
\hat F_n(x) := \frac{1}{n} \sum_{i=1}^n 1\{X_i \le x\}.
$$

In the JPS setting this estimator is unbiased and $\sqrt{n}$-consistent. However, the stratified estimator

$$
\hat F_n^S := \frac{1}{m} \sum_{k=1}^m \hat F_{nk}
$$

with the empirical c.d.f.

$$
\hat F_{nk}(x) := \frac{1}{N_{nk}} \sum_{i \in J_{nk}} 1\{X_i \le x\}
$$

within stratum $J_{nk} := \{i : R_i = k\}$ is usually more efficient. It has been introduced and analyzed in a balanced RSS setting by Stokes and Sager (1988). Refinements and modifications of this estimator $\hat F_n^S$ in the JPS setting have been proposed by Frey and Ozturk (2011) and Wang et al. (2012). In particular, these authors consider situations with small or moderate sample sizes so that some strata $J_{nk}$ may be empty or the empirical c.d.f.s $\hat F_{nk}$ may fail to satisfy order relations which are known for their theoretical counterparts $F_k$.
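A minimal Python sketch of the naive and the stratified estimator may help to fix ideas. The data below are made up for illustration, the function names are hypothetical, and the sketch simply refuses to handle empty strata rather than implementing any of the refinements cited above.

```python
def naive_ecdf(data, x):
    """Empirical c.d.f. ignoring the ranks; data is a list of (X_i, R_i)."""
    return sum(1 for xi, _ in data if xi <= x) / len(data)

def stratified_ecdf(data, m, x):
    """Average of the within-stratum empirical c.d.f.s over k = 1, ..., m."""
    total = 0.0
    for k in range(1, m + 1):
        stratum = [xi for xi, r in data if r == k]
        if not stratum:
            raise ValueError("stratum %d is empty" % k)
        total += sum(1 for xi in stratum if xi <= x) / len(stratum)
    return total / m

# Illustrative data with m = 3 and stratum sizes (2, 3, 1)
data = [(0.10, 1), (0.35, 1), (0.20, 2), (0.50, 2), (0.90, 2), (0.70, 3)]
naive = naive_ecdf(data, 0.4)            # 3 of 6 observations are <= 0.4
strat = stratified_ecdf(data, 3, 0.4)    # (2/2 + 1/3 + 0/1) / 3
print(naive, strat)
```

With unequal stratum sizes the two estimators differ, because the stratified estimator gives each stratum the same weight $1/m$.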

Another approach to estimating the c.d.f. $F$ which can also handle empty strata was introduced by Kvam and Samaniego (1994). They propose to estimate $F(x)$ by maximizing a conditional log-likelihood function. The resulting estimator $\hat F_n^L$ is given by

$$
\hat F_n^L(x) := \mathop{\mathrm{arg\,max}}_{p \in [0,1]} L_n(x, p)
$$

with the conditional log-likelihood function

$$
\begin{aligned}
L_n(x, p) &:= \sum_{i=1}^n \bigl[ 1\{X_i \le x\} \log B_{R_i}(p) + 1\{X_i > x\} \log(1 - B_{R_i}(p)) \bigr] \\
&= \sum_{k=1}^m N_{nk} \bigl[ \hat F_{nk}(x) \log B_k(p) + (1 - \hat F_{nk}(x)) \log(1 - B_k(p)) \bigr]
\end{aligned}
$$

of the indicator vector $(1\{X_i \le x\})_{i=1}^n$, given the rank vector $R_n = (R_i)_{i=1}^n$.

A possible approach which we haven't seen in the literature is to estimate $F$ by a moment equality for the naive empirical c.d.f. $\hat F_n$. Note that

$$
E\bigl( n \hat F_n(x) \mid R_n \bigr) = \sum_{k=1}^m N_{nk} B_k(F(x)).
$$

Hence we propose to estimate $F(x)$ by the unique number $\hat F_n^M(x) \in [0,1]$ such that

$$
n \hat F_n(x) = \sum_{k=1}^m N_{nk} B_k\bigl( \hat F_n^M(x) \bigr).
$$
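Since the left hand side of the moment equation is continuous and strictly increasing in the unknown, it can be solved by bisection. The following Python sketch is one way to do this; the function names and the tolerance are assumptions made for illustration.

```python
from math import comb

def beta_cdf(k, m, p):
    """B_k(p) for the beta distribution with parameters k and m+1-k."""
    return sum(comb(m, j) * p**j * (1.0 - p)**(m - j) for j in range(k, m + 1))

def moment_estimate(counts, y, tol=1e-12):
    """Solve sum_k N_nk * B_k(p) = y for p in [0, 1] by bisection.
    counts = (N_n1, ..., N_nm), y = n * (naive empirical c.d.f. at x)."""
    m, n = len(counts), sum(counts)
    if y <= 0:
        return 0.0
    if y >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        expected = sum(N * beta_cdf(k, m, mid)
                       for k, N in enumerate(counts, start=1))
        if expected < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Balanced design: sum_k N_nk B_k(p) = 5 * 4 * p, so y = 7 gives p = 0.35
p_balanced = moment_estimate((5, 5, 5, 5), 7)
# Unbalanced design: the solution differs from the naive value 25/50
p_unbalanced = moment_estimate((20, 15, 10, 5), 25)
print(p_balanced, p_unbalanced)
```

In the perfectly balanced case the moment estimator coincides with the naive empirical c.d.f., since $\sum_k B_k(p) = mp$.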

The latter estimator $\hat F_n^M$ is closely related to exact tests of null hypotheses about $F(x)$ for arbitrary fixed numbers $x \in \mathbb{R}$. These tests and the resulting confidence bounds for $F(x)$ are described in detail in Section 2.

In Section 3 we present some elementary properties of the estimators $\hat F_n^S$, $\hat F_n^M$ and $\hat F_n^L$ and comment briefly on the computation of the latter two. In particular we describe a simple method to compute the estimator $\hat F_n^L$ which is different from the proposals by Kvam and Samaniego (1994). In addition we describe confidence bands for the whole distribution function $F$.

Section 4 provides a detailed analysis of the asymptotic distribution of the estimators $\hat F_n^S$, $\hat F_n^M$ and $\hat F_n^L$ as $n \to \infty$ while $m$ is fixed and $N_{nk}/n \to 1/m$ for $1 \le k \le m$. The two most important findings are that (i) the estimators $\hat F_n^S$ and $\hat F_n^M$ are asymptotically equivalent and (ii) the estimator $\hat F_n^L$ is asymptotically more efficient than the other two, although the gain in efficiency is rather small.

All proofs are deferred to Section 5. Finally, in Section 6 we mention some possible extensions.

2 Exact pointwise inference 

From now on we condition on the rank vector $R_n$. In particular, the vector $N_n = (N_{nk})_{k=1}^m$ of stratum sizes is viewed as a fixed vector, and all probabilities and expectations refer to the conditional distribution of $X_n = (X_i)_{i=1}^n$, given $R_n$.

Note that the distribution of $n \hat F_n(x)$ depends only on $N_n$ and $F(x)$. Precisely, in case of $F(x) = p$, it has the same distribution as $\sum_{k=1}^m Y_{k,p}$ with independent random variables $Y_{1,p}, Y_{2,p}, \ldots, Y_{m,p}$, where

$$
Y_{k,p} \sim \mathrm{Bin}(N_{nk}, B_k(p)).
$$

Let $G_{N_n,p}$ be the corresponding distribution function, i.e.

$$
G_{N_n,p}(y) := P\Bigl( \sum_{k=1}^m Y_{k,p} \le y \Bigr).
$$

This is not a standard distribution function but can be computed numerically quite easily. Now an exact (conservative) p-value for the null hypothesis "$F(x) \ge p$" is given by

$$
G_{N_n,p}\bigl( n \hat F_n(x) \bigr).
$$

Likewise, a p-value for the null hypothesis "$F(x) \le p$" is given by

$$
1 - G_{N_n,p}\bigl( n \hat F_n(x) - 1 \bigr).
$$

These p-values imply two different $(1-\alpha)$-confidence regions for $F(x)$, namely,

$$
\bigl\{ p \in [0,1] : G_{N_n,p}(n \hat F_n(x)) \ge \alpha \bigr\} \quad\text{or}\quad \bigl\{ p \in [0,1] : G_{N_n,p}(n \hat F_n(x) - 1) \le 1 - \alpha \bigr\}.
$$

Elementary considerations reveal that for any $y \in \{0, 1, \ldots, n-1\}$, the distribution function $G_{N_n,p}(y)$ is continuous and strictly decreasing in $p \in [0,1]$ with boundary values $G_{N_n,0}(y) = 1$ and $G_{N_n,1}(y) = 0$. Moreover, $G_{N_n,p}(n) = 1$ and $G_{N_n,p}(-1) = 0$ for all $p \in [0,1]$. Consequently, the $(1-\alpha)$-confidence regions above lead to one-sided $(1-\alpha)$-confidence bounds:

$$
\begin{aligned}
\bigl\{ p \in [0,1] : G_{N_n,p}(n \hat F_n(x)) \ge \alpha \bigr\} &= \bigl[ 0, b_\alpha(N_n, n \hat F_n(x)) \bigr], \\
\bigl\{ p \in [0,1] : G_{N_n,p}(n \hat F_n(x) - 1) \le 1 - \alpha \bigr\} &= \bigl[ a_\alpha(N_n, n \hat F_n(x)), 1 \bigr].
\end{aligned}
$$

Here $b_\alpha(N_n, y)$ is the unique solution $p \in (0,1)$ of the equation $G_{N_n,p}(y) = \alpha$ if $y \in \{0, 1, \ldots, n-1\}$, and $b_\alpha(N_n, n) = 1$. Likewise, $a_\alpha(N_n, y)$ is the unique solution $p \in (0,1)$ of the equation $G_{N_n,p}(y-1) = 1 - \alpha$ if $y \in \{1, 2, \ldots, n\}$, and $a_\alpha(N_n, 0) = 0$. Finding the solution $p \in (0,1)$ of an equation such as $G_{N_n,p}(y) = u$ is possible via a bisection algorithm.
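The computation just described can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: the distribution of $\sum_k Y_{k,p}$ is obtained by convolving the $m$ binomial distributions, and the bounds are found by bisection. All function names and the tolerance are assumptions.

```python
from math import comb

def beta_cdf(k, m, p):
    """B_k(p) for the beta distribution with parameters k and m+1-k."""
    return sum(comb(m, j) * p**j * (1.0 - p)**(m - j) for j in range(k, m + 1))

def G(counts, p, y):
    """P(sum_k Y_k <= y) with independent Y_k ~ Bin(N_nk, B_k(p))."""
    if y < 0:
        return 0.0
    m = len(counts)
    dist = [1.0]                               # distribution of the empty sum
    for k, N in enumerate(counts, start=1):
        q = beta_cdf(k, m, p)
        pmf = [comb(N, j) * q**j * (1.0 - q)**(N - j) for j in range(N + 1)]
        new = [0.0] * (len(dist) + N)
        for i, a in enumerate(dist):           # discrete convolution
            for j, b in enumerate(pmf):
                new[i + j] += a * b
        dist = new
    return min(1.0, sum(dist[: int(y) + 1]))

def solve_decreasing(f, target, tol=1e-10):
    """Bisection for f(p) = target, f continuous and decreasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_bound(counts, y, alpha):
    """b_alpha(N_n, y)."""
    if y >= sum(counts):
        return 1.0
    return solve_decreasing(lambda p: G(counts, p, y), alpha)

def lower_bound(counts, y, alpha):
    """a_alpha(N_n, y)."""
    if y <= 0:
        return 0.0
    return solve_decreasing(lambda p: G(counts, p, y - 1), 1.0 - alpha)

counts, y, alpha = (20, 15, 10, 5), 25, 0.05
a = lower_bound(counts, y, alpha / 2)
b = upper_bound(counts, y, alpha / 2)
print(a, b)   # two-sided 95% confidence interval for F(x) when n*F_n(x) = 25
```

For $m = 1$ the function `G` reduces to the ordinary binomial distribution function, so the same code also yields the standard bounds that ignore the ranks.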

Obviously one can combine lower and upper bounds and compute the $(1-\alpha)$-confidence interval $\bigl[ a_{\alpha/2}(N_n, n \hat F_n(x)), b_{\alpha/2}(N_n, n \hat F_n(x)) \bigr]$ for $F(x)$. To compute these intervals for all $x \in \mathbb{R}$, we only have to compute the $n$ upper bounds $b_{\alpha/2}(N_n, y)$, $0 \le y < n$, and the $n$ lower bounds $a_{\alpha/2}(N_n, y)$, $1 \le y \le n$. Note also that $a_{\alpha/2}(N_n, y) < b_{\alpha/2}(N_n, y)$ for $\alpha \in (0,1)$ and $0 \le y \le n$. This is obvious in case of $y$ being $0$ or $n$. In case of $1 \le y < n$ it follows from the fact that $a = a_{\alpha/2}(N_n, y)$ and $b = b_{\alpha/2}(N_n, y)$ satisfy the (in)equalities $G_{N_n,b}(y) = \alpha/2 < 1 - \alpha/2 = G_{N_n,a}(y-1) < G_{N_n,a}(y)$.

If we ignored the ranks $R_i$ and just pretended that $X_1, X_2, \ldots, X_n$ are i.i.d. with distribution function $F$, then we would work with the distribution function $G_{n,p}$ of the binomial distribution $\mathrm{Bin}(n, p)$ instead of $G_{N_n,p}$. This would lead to standard confidence bounds $a_\alpha^{*}(n, n \hat F_n(x))$, $b_\alpha^{*}(n, n \hat F_n(x))$ and a standard confidence interval with endpoints $a_{\alpha/2}^{*}(n, n \hat F_n(x))$, $b_{\alpha/2}^{*}(n, n \hat F_n(x))$ for $F(x)$.

Numerical examples. Figures 1 and 2 show for $n = 50$ and $\alpha = 5\%$ the boundaries $a_{\alpha/2}^{*}(n, y)$, $b_{\alpha/2}^{*}(n, y)$, $a_{\alpha/2}(N_n, y)$ and $b_{\alpha/2}(N_n, y)$ in two different settings. The horizontal axis corresponds to the potential values of $n \hat F_n$. Thus for $y = 1, 2, \ldots, n+1$, the depicted boundaries on $[y-1, y]$ correspond to the resulting boundaries on $[X_{(y-1)}, X_{(y)}]$, where $-\infty = X_{(0)} < X_{(1)} < X_{(2)} < \cdots < X_{(n)} < X_{(n+1)} = \infty$ are the order statistics of $X_1, X_2, \ldots, X_n$.

Figure 1 is for the classical RSS setting with $m = n = 50$ and $N_n = (1, 1, \ldots, 1)^\top$. The innermost step function corresponds to $y/n$, i.e. $\hat F_n$, the two outermost step functions represent $a_{\alpha/2}^{*}(n, y)$ and $b_{\alpha/2}^{*}(n, y)$, and the remaining two thick lines represent $a_{\alpha/2}(N_n, y)$ and $b_{\alpha/2}(N_n, y)$. Obviously the additional information contained in $R_n$ increases the precision substantially.

Figure 2 shows these boundaries in case of $m = 4$ and $N_n = (20, 15, 10, 5)^\top$. Here ignoring the rank information and pretending the $X_i$ to be i.i.d. would induce a substantial bias and lead to conditional coverage probabilities smaller than 95%.

3 Computation and basic properties of the estimators 

Computations. While the computation of the stratified estimator $\hat F_n^S$ is straightforward, the estimators $\hat F_n^M$ and $\hat F_n^L$ may be computed numerically by running a suitable bisection algorithm $n - 1$ times.

Figure 1: Pointwise 95%-confidence band for F in the RSS setting with n = 30.

For $\hat F_n^M$ this is rather obvious. Note that $n^{-1} \sum_{k=1}^m N_{nk} B_k(p)$ is continuous and strictly increasing in $p \in [0,1]$ with boundary values $0$ and $1$. Hence for $x < X_{(1)}$, we obtain $\hat F_n^M(x) = 0 = \hat F_n(x)$, and for $x \ge X_{(n)}$, we get $\hat F_n^M(x) = 1 = \hat F_n(x)$. For $X_{(y)} \le x < X_{(y+1)}$ with $1 \le y < n$, the estimator $\hat F_n^M(x)$ is the unique solution $p$ of

$$
\sum_{k=1}^m N_{nk} B_k(p) = y.
$$

As to $\hat F_n^L$, note first that for any fixed $x \in \mathbb{R}$, the log-likelihood function $L_n(x, \cdot) : [0,1] \to [-\infty, \infty)$ is continuous and differentiable on $(0,1)$ with derivative

$$
L_n'(x, p) := \frac{\partial}{\partial p} L_n(x, p) = \sum_{k=1}^m N_{nk} \Bigl[ \frac{\beta_k}{B_k}(p) \, \hat F_{nk}(x) - \frac{\beta_k}{1 - B_k}(p) \, \bigl( 1 - \hat F_{nk}(x) \bigr) \Bigr].
$$

Note also that

$$
\frac{\beta_k}{B_k}(p) = C_k \Bigl( \sum_{j=k}^m \binom{m}{j} p^{j+1-k} (1-p)^{k-j} \Bigr)^{-1}
$$

and

$$
\frac{\beta_k}{1 - B_k}(p) = C_k \Bigl( \sum_{j=0}^{k-1} \binom{m}{j} p^{j+1-k} (1-p)^{k-j} \Bigr)^{-1}
$$

are strictly decreasing and strictly increasing in $p \in (0,1)$, respectively. Consequently, the derivative $L_n'(x, \cdot)$ is continuous and strictly decreasing on $(0,1)$. Hence $L_n(x, \cdot)$ is continuous and strictly concave on $[0,1]$. In particular, the estimator $\hat F_n^L(x)$ is well-defined for any $x \in \mathbb{R}$.

Figure 2: Pointwise 95%-confidence bands for F in case of $N_n = (20, 15, 10, 5)^\top$.

For explicit calculations we rewrite the derivative $L_n'(x, p)$ as follows:

$$
L_n'(x, p) = \sum_{k=1}^m N_{nk} w_k(p) \bigl[ \hat F_{nk}(x) - B_k(p) \bigr]
$$

with the auxiliary function

$$
w_k(p) := \frac{\beta_k}{B_k (1 - B_k)}(p) = \frac{\beta_k(p)}{B_k(p) \, B_{m+1-k}(1-p)}.
$$

The latter equation uses the relation $1 - B_k(p) = B_{m+1-k}(1-p)$ and is highly recommended to avoid rounding errors in case of $p$ being close to $1$. Note also that

$$
w_k(p) = \begin{cases} \dfrac{k + o(1)}{p} & \text{as } p \downarrow 0, \\[2ex] \dfrac{m + 1 - k + o(1)}{1 - p} & \text{as } p \uparrow 1. \end{cases} \tag{1}
$$

This implies that

$$
\lim_{p \downarrow 0} L_n'(x, p) \begin{cases} = +\infty & \text{if } \hat F_{nk}(x) > 0 \text{ for at least one } k, \\ \le 0 & \text{if } \hat F_{nk}(x) = 0 \text{ for all } k, \end{cases}
$$

$$
\lim_{p \uparrow 1} L_n'(x, p) \begin{cases} = -\infty & \text{if } \hat F_{nk}(x) < 1 \text{ for at least one } k, \\ \ge 0 & \text{if } \hat F_{nk}(x) = 1 \text{ for all } k. \end{cases}
$$

Consequently, if $x < X_{(1)}$, then $L_n'(x, \cdot) < 0$ on $(0,1)$, so $\hat F_n^L(x) = 0$. Likewise, if $x \ge X_{(n)}$, then $L_n'(x, \cdot) > 0$ on $(0,1)$, whence $\hat F_n^L(x) = 1$. For $y = 1, 2, \ldots, n-1$, all functions $\hat F_{nk}(x)$ are constant in $x \in [X_{(y)}, X_{(y+1)})$, and there the estimator $\hat F_n^L(x)$ is the unique solution $p \in (0,1)$ of the equation $L_n'(X_{(y)}, p) = 0$. Again this may be found numerically via bisection.
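The bisection for the likelihood estimator can be sketched as follows. The code is an illustration under assumptions: the function names are hypothetical, empty strata are simply skipped, and $w_k$ is evaluated in the numerically stable form with $B_{m+1-k}(1-p)$ in place of $1 - B_k(p)$.

```python
from math import comb

def beta_cdf(k, m, p):
    """B_k(p) for the beta distribution with parameters k and m+1-k."""
    return sum(comb(m, j) * p**j * (1.0 - p)**(m - j) for j in range(k, m + 1))

def beta_pdf(k, m, u):
    """beta_k(u) = m * binom(m-1, k-1) * u^(k-1) * (1-u)^(m-k)."""
    return m * comb(m - 1, k - 1) * u**(k - 1) * (1.0 - u)**(m - k)

def w(k, m, p):
    """w_k(p) = beta_k(p) / (B_k(p) * B_{m+1-k}(1-p))."""
    return beta_pdf(k, m, p) / (beta_cdf(k, m, p) * beta_cdf(m + 1 - k, m, 1.0 - p))

def score(counts, fracs, p):
    """L_n'(x, p) = sum_k N_nk w_k(p) [F_nk(x) - B_k(p)], empty strata skipped."""
    m = len(counts)
    return sum(N * w(k, m, p) * (f - beta_cdf(k, m, p))
               for k, (N, f) in enumerate(zip(counts, fracs), start=1) if N > 0)

def likelihood_estimate(counts, fracs, tol=1e-12):
    """Root of the strictly decreasing score function on (0, 1).
    fracs[k-1] is the within-stratum empirical c.d.f. at x (ignored if N_nk = 0)."""
    active = [f for N, f in zip(counts, fracs) if N > 0]
    if all(f == 0 for f in active):
        return 0.0
    if all(f == 1 for f in active):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(counts, fracs, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Sanity check: if every stratum fraction equals B_k(0.4), the root is 0.4
counts = (3, 5, 2)
fracs = [beta_cdf(k, 3, 0.4) for k in (1, 2, 3)]
p_hat = likelihood_estimate(counts, fracs)
print(p_hat)
```

With a single stratum ($m = 1$) the score reduces to $N (\hat F - p) / (p(1-p))$, so the estimator is the ordinary empirical fraction.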

Basic distributional properties. All estimators $\hat F_n$, $\hat F_n^S$, $\hat F_n^M$ and $\hat F_n^L$ are distribution-free in the following sense: Let $\hat B_n$, $\hat B_n^S$, $\hat B_n^M$ and $\hat B_n^L$ be defined analogously with raw observations from the uniform distribution on $[0,1]$. That means, we replace the random variables $X_1, X_2, \ldots, X_n$ with independent random variables in $[0,1]$, where the $i$-th one has distribution function $B_k$ if $R_i = k$. (Recall that we condition on $R_n$.) Then

$$
\bigl( \hat F_n^Z(x) \bigr)_{x \in \mathbb{R}} \quad\text{has the same distribution as}\quad \bigl( \hat B_n^Z(F(x)) \bigr)_{x \in \mathbb{R}},
$$

where $Z = S, M, L$. Hence it suffices to analyze the distribution of the random processes $\hat B_n^Z$.

For instance, we may compute Kolmogorov-Smirnov confidence bands for the unknown distribution function $F$ as follows: Let $\kappa_\alpha^Z = \kappa^Z(N_n, \alpha)$ be the $(1-\alpha)$-quantile of the random variable $\sup_{t \in [0,1]} |\hat B_n^Z(t) - t|$. Then we may conclude with confidence $1 - \alpha$ that

$$
F(x) \in \bigl[ \hat F_n^Z(x) \pm \kappa_\alpha^Z \bigr] \quad\text{for all } x \in \mathbb{R}.
$$

The quantiles $\kappa^Z(N_n, \alpha)$ may be estimated via Monte-Carlo simulations.
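A Monte-Carlo approximation of such a quantile might look as follows for the stratified estimator ($Z = S$). This is a sketch under assumptions: all strata are non-empty, the number of replications is arbitrary, the function names are hypothetical, and the quantile is taken as a simple order statistic of the simulated values.

```python
import random

def ks_stat_stratified(strata):
    """sup_t |B_n^S(t) - t| for uniform-scale data; strata[k-1] holds the
    observations with rank k. The supremum of |step function - t| is
    attained at a jump point, either as left limit or as value."""
    m = len(strata)
    jumps = sorted((u, 1.0 / (m * len(s))) for s in strata for u in s)
    stat, value = 0.0, 0.0
    for u, height in jumps:
        stat = max(stat, abs(value - u))   # left limit at the jump
        value += height
        stat = max(stat, abs(value - u))   # value at the jump
    return stat

def ks_quantile(counts, alpha, reps, rng):
    """Simulated (1 - alpha)-quantile of the Kolmogorov-Smirnov statistic,
    drawing stratum k from the beta distribution with parameters k, m+1-k."""
    m = len(counts)
    stats = []
    for _ in range(reps):
        strata = [[rng.betavariate(k, m + 1 - k) for _ in range(N)]
                  for k, N in enumerate(counts, start=1)]
        stats.append(ks_stat_stratified(strata))
    stats.sort()
    return stats[min(reps - 1, int((1 - alpha) * reps))]

kappa = ks_quantile((5, 5, 5, 5), 0.05, 200, random.Random(0))
print(kappa)   # half-width of an approximate 95% confidence band
```

The band is then obtained by adding and subtracting this value from the stratified estimator and clipping to $[0,1]$.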

4 Asymptotic expansions and limiting distributions 

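As mentioned in the introduction, we consider the asymptotic behavior of the estimators $\hat B_n^S$, $\hat B_n^M$ and $\hat B_n^L$ as $n \to \infty$ while $m$ is fixed and

$$
\frac{N_{nk}}{n} \to \frac{1}{m} \quad\text{for } 1 \le k \le m.
$$

Let us start with some heuristic considerations to identify the limiting distributions: For $1 \le k \le m$ let

$$
\mathbb{V}_{nk} := \sqrt{N_{nk}} \, (\hat B_{nk} - B_k) \circ B_k^{-1}.
$$

It follows from Donsker's theorem for the empirical process that $\mathbb{V}_{nk}$ behaves asymptotically like a standard Brownian bridge process $\mathbb{V} = (\mathbb{V}(u))_{u \in [0,1]}$; see also Proposition 4 in Section 5. Thus $\sup_{t \in [0,1]} |\mathbb{V}_{nk}(t)| = O_p(1)$ and

$$
\sqrt{n} \, (\hat B_n^S - B) = \frac{1}{m} \sum_{k=1}^m \sqrt{\frac{n}{N_{nk}}} \; \mathbb{V}_{nk} \circ B_k \approx \frac{1}{\sqrt{m}} \sum_{k=1}^m \mathbb{V}_{nk} \circ B_k,
$$

where $B(t) := t = m^{-1} \sum_{k=1}^m B_k(t)$ for $t \in [0,1]$.

As to the estimator $\hat B_n^M$, we write

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p) &= \sum_{k=1}^m N_{nk} \bigl( \hat B_{nk}(t) - B_k(t) + B_k(t) - B_k(p) \bigr) \\
&\approx \frac{n}{m} \sum_{k=1}^m \bigl( \hat B_{nk}(t) - B_k(t) - \beta_k(t)(p - t) \bigr),
\end{aligned}
$$

and the right hand side equals $0$ if, and only if,

$$
p = t + \frac{1}{m} \sum_{k=1}^m (\hat B_{nk} - B_k)(t) = \hat B_n^S(t).
$$

Here we utilized the fact that $\sum_{k=1}^m \beta_k \equiv m$. Hence we expect that

$$
\sqrt{n} \, (\hat B_n^M - B) \approx \sqrt{n} \, (\hat B_n^S - B).
$$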

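Finally, we write

$$
\begin{aligned}
L_n'(t, p) &= \sum_{k=1}^m N_{nk} w_k(p) \bigl[ \hat B_{nk}(t) - B_k(p) \bigr] \\
&\approx \frac{n}{m} \sum_{k=1}^m w_k(t) \bigl[ \hat B_{nk}(t) - B_k(t) - \beta_k(t)(p - t) \bigr] \\
&= \frac{n}{m} \sum_{k=1}^m w_k(t) \Bigl[ \frac{\mathbb{V}_{nk}(B_k(t))}{\sqrt{N_{nk}}} - \beta_k(t)(p - t) \Bigr].
\end{aligned}
$$

This approximation to $L_n'(t, p)$ equals $0$ if, and only if,

$$
p = t + \sqrt{\frac{m}{n}} \sum_{k=1}^m \gamma_k(t) \mathbb{V}_{nk}(B_k(t)) \quad\text{with}\quad \gamma_k := w_k \Big/ \sum_{\ell=1}^m w_\ell \beta_\ell ,
$$

where we used the approximation $N_{nk} \approx n/m$. Hence we expect that

$$
\sqrt{n} \, (\hat B_n^L - B) \approx \sqrt{m} \sum_{k=1}^m \gamma_k \, \mathbb{V}_{nk} \circ B_k \tag{2}
$$

on $(0,1)$. Recall the expansion (1) of $w_k(t)$ for $t$ close to $0$ or $1$. Together with $\beta_k(0) = 1\{k = 1\} m$ and $\beta_k(1) = 1\{k = m\} m$ this implies that

$$
\gamma_k(0) := \lim_{t \downarrow 0} \gamma_k(t) = \frac{k}{m}, \qquad \gamma_k(1) := \lim_{t \uparrow 1} \gamma_k(t) = \frac{m + 1 - k}{m}.
$$

Hence the stochastic process on the right hand side of (2) may be defined on $[0,1]$.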
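Theorem 1. Under the conditions of this section, for $Z = S, M, L$ and any fixed $\delta \in (0, 1/2)$,

$$
\sup_{t \in (0,1)} \frac{\bigl| \sqrt{n} \, (\hat B_n^Z(t) - t) - \mathbb{V}_n^Z(t) \bigr|}{t^\delta (1-t)^\delta} \to_p 0,
$$

where

$$
\mathbb{V}_n^S := \mathbb{V}_n^M := \frac{1}{\sqrt{m}} \sum_{k=1}^m \mathbb{V}_{nk} \circ B_k
$$

and

$$
\mathbb{V}_n^L := \sqrt{m} \sum_{k=1}^m \gamma_k \, \mathbb{V}_{nk} \circ B_k .
$$

Moreover,

$$
\sup_{t \in (0,c] \cup [1-c,1)} \frac{|\mathbb{V}_n^Z(t)|}{t^\delta (1-t)^\delta} \to_p 0 \quad\text{as } n \to \infty \text{ and } c \downarrow 0.
$$
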
This theorem shows that all three estimators $\hat F_n^S$, $\hat F_n^M$, $\hat F_n^L$ are root-$n$-consistent and that the estimators $\hat F_n^S$ and $\hat F_n^M$ are even asymptotically equivalent. The next theorem specifies their limiting distribution and shows that $\hat F_n^L$ is asymptotically more efficient than $\hat F_n^S$ and $\hat F_n^M$.

Theorem 2. For $Z = S, M, L$, the stochastic process $\mathbb{V}_n^Z$ converges in distribution in the space $\ell_\infty([0,1])$ to a centered Gaussian process $\mathbb{V}^Z$ with continuous paths on $[0,1]$. Let $K^Z$ be the covariance function of $\mathbb{V}^Z$, i.e. $K^Z(s,t) := E\bigl( \mathbb{V}^Z(s) \mathbb{V}^Z(t) \bigr)$, and let $K(s,t) := \min\{s,t\} - st$. Then

$$
K^S(s,t) = K^M(s,t) = \frac{1}{m} \sum_{k=1}^m K\bigl( B_k(s), B_k(t) \bigr) = \min\{s,t\} - \frac{1}{m} \sum_{k=1}^m B_k(s) B_k(t),
$$

whereas

$$
K^L(s,t) = m \sum_{k=1}^m \gamma_k(s) \gamma_k(t) K\bigl( B_k(s), B_k(t) \bigr).
$$

Moreover, for $0 < t < 1$,

$$
K^L(t,t) \le K^S(t,t) = K^M(t,t) < K(t,t),
$$

where the first inequality is strict unless $t = 1/2$ and $m = 2$.
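The three asymptotic variances in Theorem 2 are easy to evaluate numerically. The following Python sketch (function names are hypothetical) computes $K(t,t)$, $K^S(t,t)$ and $K^L(t,t)$ directly from the formulas in the theorem and the definition of $\gamma_k$, and illustrates the ordering as well as the equality case $t = 1/2$, $m = 2$.

```python
from math import comb

def beta_cdf(k, m, p):
    """B_k(p) for the beta distribution with parameters k and m+1-k."""
    return sum(comb(m, j) * p**j * (1.0 - p)**(m - j) for j in range(k, m + 1))

def beta_pdf(k, m, u):
    """beta_k(u) = m * binom(m-1, k-1) * u^(k-1) * (1-u)^(m-k)."""
    return m * comb(m - 1, k - 1) * u**(k - 1) * (1.0 - u)**(m - k)

def asymptotic_variances(m, t):
    """Return (K(t,t), K^S(t,t), K^L(t,t)) for 0 < t < 1."""
    B = [beta_cdf(k, m, t) for k in range(1, m + 1)]
    b = [beta_pdf(k, m, t) for k in range(1, m + 1)]
    w = [bk / (Bk * (1.0 - Bk)) for bk, Bk in zip(b, B)]
    norm = sum(wk * bk for wk, bk in zip(w, b))
    gamma = [wk / norm for wk in w]
    K = t * (1.0 - t)
    KS = sum(Bk * (1.0 - Bk) for Bk in B) / m
    KL = m * sum(g * g * Bk * (1.0 - Bk) for g, Bk in zip(gamma, B))
    return K, KS, KL

K, KS, KL = asymptotic_variances(4, 0.3)
print(K, KS, KL, KS / KL)
K2, KS2, KL2 = asymptotic_variances(2, 0.5)
print(KS2, KL2)   # equality case of the first inequality
```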

Note that $K(\cdot,\cdot)$ is the covariance function of the standard Brownian bridge $\mathbb{V}$, which corresponds to the limiting distribution of $\sqrt{n} \, (\hat F_n - F)$ in case of simple random samples. Hence all three estimators $\hat F_n^S$, $\hat F_n^M$ and $\hat F_n^L$ are asymptotically more efficient than the naive estimator $\hat F_n$.

Figures 3 and 4 illustrate Theorem 2 for $m = 4$ and $m = 10$, respectively. On the left hand side the three variances $K(t,t)$, $K^S(t,t) = K^M(t,t)$ and $K^L(t,t)$ are plotted for $0 \le t \le 1$. On the right hand side one sees the relative asymptotic efficiencies, i.e. the ratios of the asymptotic variances $K^S(t,t)$ and $K^L(t,t)$, $0 < t < 1$. The latter plot confirms that the efficiency gain of $\hat F_n^L$ versus $\hat F_n^S$, $\hat F_n^M$ is strictly positive on $\{0 < F < 1\}$, but it is rather small. Therefore we did not try to construct pointwise confidence intervals based on log-likelihoods but prefer the simpler procedure developed in Section 2.

Figure 3: Asymptotic variances and relative efficiencies for m = 4.

Figure 4: Asymptotic variances and relative efficiencies for m = 10.

Our final result in this section shows that the estimators $\hat B_n^M$ and $\hat B_n^L$ are asymptotically equivalent in the tail regions:

Theorem 3. For any fixed $\kappa \in [1/2, 1)$ and $Z = M, L$,

$$
\sup_{t \in (0,c]} t^{-\kappa} \bigl| \sqrt{n} \, (\hat B_n^Z(t) - t) - \mathbb{V}_n^{(\ell)}(t) \bigr| \to_p 0
$$

and

$$
\sup_{t \in [1-c,1)} (1-t)^{-\kappa} \bigl| \sqrt{n} \, (\hat B_n^Z(t) - t) - \mathbb{V}_n^{(r)}(t) \bigr| \to_p 0
$$

as $n \to \infty$ and $c \downarrow 0$, where

$$
\mathbb{V}_n^{(\ell)}(t) := \frac{\mathbb{V}_{n1}(B_1(t))}{m \sqrt{N_{n1}/n}} \quad\text{and}\quad \mathbb{V}_n^{(r)}(t) := \frac{\mathbb{V}_{nm}(B_m(t))}{m \sqrt{N_{nm}/n}} .
$$

5 Proofs 

We first recall two well-known facts about uniform empirical processes, see Shorack and Wellner (1986).

Proposition 4. Let $U_1, U_2, U_3, \ldots$ be independent random variables with uniform distribution on $[0,1]$. For $N \in \mathbb{N}$ and $u \in [0,1]$ define $\mathbb{V}_N(u) := N^{-1/2} \sum_{i=1}^N (1\{U_i \le u\} - u)$. Then, as $N \to \infty$, $\mathbb{V}_N$ converges in distribution in $\ell_\infty([0,1])$ to a standard Brownian bridge $\mathbb{V}$ on $[0,1]$. Moreover, for any fixed $\delta \in (0, 1/2)$,

$$
\sup_{u \in (0,c] \cup [1-c,1)} \frac{|\mathbb{V}_N(u)|}{u^\delta (1-u)^\delta} \to_p 0 \quad\text{as } N \to \infty \text{ and } c \downarrow 0.
$$

For the estimators $\hat F_n^M$, $\hat F_n^L$ we need some basic facts and inequalities for the auxiliary functions $w_k$ and $B_k$.

Lemma 5. (a) For $k = 1, 2, \ldots, m$, the function $w_k$ on $(0,1)$ may be written as $w_k(t) = \tilde w_k(t) / (t(1-t))$ with $\tilde w_k : [0,1] \to (0,\infty)$ continuously differentiable. In particular, there exist constants $c_w = c_w(m)$ and $C_w = C_w(m)$ in $(0,\infty)$ such that for $k = 1, 2, \ldots, m$ and $t \in (0,1)$,

$$
\frac{c_w}{t(1-t)} \le w_k(t) \le \frac{C_w}{t(1-t)} .
$$

(b) For any constant $c \in (0,1)$ there exists a number $c' = c'(m, c) > 0$ with the following property: If $t, p \in (0,1)$ such that

$$
\frac{|p - t|}{t(1-t)} \le c,
$$

then for $k = 1, 2, \ldots, m$,

$$
\max \Bigl\{ \Bigl| \frac{w_k(p)}{w_k(t)} - 1 \Bigr| , \Bigl| \frac{B_k(p) - B_k(t)}{\beta_k(t)(p - t)} - 1 \Bigr| \Bigr\} \le c' \, \frac{|p - t|}{t(1-t)} .
$$


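Proof of Lemma 5. As to part (a), note that $w_k$ is a rational and strictly positive function on $(0,1)$. Hence $\tilde w_k(t) := t(1-t) w_k(t)$ defines a function with these properties, too. Moreover, $\lim_{t \downarrow 0} \tilde w_k(t) = k$ and $\lim_{t \uparrow 1} \tilde w_k(t) = m + 1 - k$. Hence $\tilde w_k$ may be viewed as a rational and strictly positive function on a neighborhood of $[0,1]$. In particular, $\tilde w_k$ is continuously differentiable on $[0,1]$ with values in $[c_w, C_w]$, where $c_w := \min_{1 \le k \le m, \, t \in [0,1]} \tilde w_k(t) > 0$ and $C_w := \max_{1 \le k \le m, \, t \in [0,1]} \tilde w_k(t) < \infty$.

For proving part (b), note first that $|p - t| \le c \, t(1-t)$ implies the inequalities $p \le (1+c) t$ and $1 - p \le (1+c)(1-t)$. Moreover, since $|p(1-p) - t(1-t)| \le |p - t|$, we may conclude that $p(1-p) \ge (1-c) \, t(1-t)$. Consequently,

$$
\begin{aligned}
\Bigl| \frac{w_k(p)}{w_k(t)} - 1 \Bigr| &= \frac{\bigl| \tilde w_k(p) \, t(1-t) - \tilde w_k(t) \, p(1-p) \bigr|}{\tilde w_k(t) \, p(1-p)} \\
&\le \frac{|\tilde w_k(p) - \tilde w_k(t)| \, t(1-t) + \tilde w_k(t) \, |t(1-t) - p(1-p)|}{\tilde w_k(t) \, p(1-p)} \\
&\le \frac{|\tilde w_k(p) - \tilde w_k(t)| / 4 + C_w |t - p|}{c_w (1-c) \, t(1-t)} \\
&\le \frac{c_w'/4 + C_w}{c_w (1-c)} \, \frac{|p - t|}{t(1-t)},
\end{aligned}
$$

where $c_w' := \max_{1 \le k \le m, \, u \in [0,1]} |\tilde w_k'(u)|$. Moreover, for $\min(t,p) \le \xi \le \max(t,p)$,

$$
\frac{|\beta_k'(\xi)|}{\beta_k(\xi)} = \Bigl| \frac{k-1}{\xi} - \frac{m-k}{1-\xi} \Bigr| \le \frac{m-1}{\xi(1-\xi)} \le \frac{m-1}{(1-c) \, t(1-t)}
\quad\text{and}\quad
\frac{\beta_k(\xi)}{\beta_k(t)} \le (1+c)^{m-1} .
$$

Hence Taylor's formula implies that for a suitable such $\xi$,

$$
\Bigl| \frac{B_k(p) - B_k(t)}{\beta_k(t)(p - t)} - 1 \Bigr| = \frac{|\beta_k'(\xi)| \, |p - t|}{2 \beta_k(t)} \le \frac{(m-1)(1+c)^{m-1}}{2(1-c)} \, \frac{|p - t|}{t(1-t)} .
$$

□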

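Proof of Theorem 1. Note first that for $n \ge 1$ and $1 \le k \le m$, the empirical process $\mathbb{V}_{nk}$ is distributed as $\mathbb{V}_{N_{nk}}$ in Proposition 4. Hence,

$$
\sup_{1 \le k \le m, \, u \in (0,1)} \frac{|\mathbb{V}_{nk}(u)|}{u^\delta (1-u)^\delta} = O_p(1),
$$

$$
\sup_{1 \le k \le m, \, u \in (0,c] \cup [1-c,1)} \frac{|\mathbb{V}_{nk}(u)|}{u^\delta (1-u)^\delta} \to_p 0 \quad\text{as } n \to \infty \text{ and } c \downarrow 0.
$$

Note also that for $1 \le k \le m$,

$$
B_k(t) \le B_1(t) \le mt \quad\text{and}\quad 1 - B_k(t) \le 1 - B_m(t) \le m(1-t),
$$

so

$$
\frac{B_k(t)(1 - B_k(t))}{t(1-t)} \le m.
$$

Hence we may conclude that for $Z = S, M, L$,

$$
\sup_{t \in (0,1)} \frac{|\mathbb{V}_n^Z(t)|}{t^\delta (1-t)^\delta} = O_p(1), \tag{3}
$$

$$
\sup_{t \in (0,c] \cup [1-c,1)} \frac{|\mathbb{V}_n^Z(t)|}{t^\delta (1-t)^\delta} \to_p 0 \quad\text{as } n \to \infty \text{ and } c \downarrow 0. \tag{4}
$$

Since $\max_k |N_{nk}/n - 1/m| \to 0$, these conclusions (3) and (4) remain true if we replace $\mathbb{V}_n^Z(t)$ with

$$
\widehat{\mathbb{V}}_n^Z(t) := \sum_{k=1}^m \gamma_{nk}^Z(t) \, \mathbb{V}_{nk}(B_k(t)),
$$

where

$$
\gamma_{nk}^S(t) := \frac{\sqrt{n}}{m \sqrt{N_{nk}}}, \qquad
\gamma_{nk}^M(t) := \frac{\sqrt{n N_{nk}}}{\sum_{\ell=1}^m N_{n\ell} \beta_\ell(t)}, \qquad
\gamma_{nk}^L(t) := \frac{\sqrt{n N_{nk}} \; w_k(t)}{\sum_{\ell=1}^m N_{n\ell} w_\ell(t) \beta_\ell(t)},
$$

and

$$
\sup_{t \in (0,1)} \frac{|\widehat{\mathbb{V}}_n^Z(t) - \mathbb{V}_n^Z(t)|}{t^\delta (1-t)^\delta} \to_p 0.
$$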

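It remains to be shown that for $Z = S, M, L$, the process $\sqrt{n} \, (\hat B_n^Z - B)$ may be approximated by $\widehat{\mathbb{V}}_n^Z$. In case of $Z = S$ we even have the equality $\sqrt{n} \, (\hat B_n^S - B) = \widehat{\mathbb{V}}_n^S$.

For $Z = M, L$ it suffices to show that for any fixed number $b \ne 0$ and

$$
p_n^Z(t) := t + n^{-1/2} \bigl( \widehat{\mathbb{V}}_n^Z(t) + b \, t^\delta (1-t)^\delta \bigr)
$$

the following statements are true: If $b < 0$, then with asymptotic probability one,

$$
\inf_{t \in (0,1)} \Bigl( n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p_n^M(t)) \Bigr) \ge 0 \quad\text{and}\quad \inf_{t \in (0,1)} L_n'(t, p_n^L(t)) \ge 0. \tag{5}
$$

If $b > 0$, then with asymptotic probability one,

$$
\sup_{t \in (0,1)} \Bigl( n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p_n^M(t)) \Bigr) \le 0 \quad\text{and}\quad \sup_{t \in (0,1)} L_n'(t, p_n^L(t)) \le 0. \tag{6}
$$

Here we use the conventions that $L_n'(t, \cdot) := \infty$ and $B_k := 0$ on $(-\infty, 0]$ while $L_n'(t, \cdot) := -\infty$ and $B_k := 1$ on $[1, \infty)$.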

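To verify these claims, we split the interval $(0,1)$ into $(0, c_n]$, $[c_n, 1 - c_n]$ and $[1 - c_n, 1)$ with numbers $c_n \in (0, 1/2)$ to be specified later such that $c_n \downarrow 0$.

On $[c_n, 1 - c_n]$ we utilize Lemma 5. For $t \in [c_n, 1 - c_n]$ and $p \in (0,1)$ such that $|p - t| \le t(1-t)/2$ we may write

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p)
&= \sum_{k=1}^m \sqrt{N_{nk}} \, \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} \bigl( B_k(p) - B_k(t) \bigr) \\
&= \sum_{k=1}^m \sqrt{N_{nk}} \, \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} \beta_k(t)(p - t) + \rho_n^M(t, p) \\
&= \sum_{k=1}^m N_{nk} \beta_k(t) \Bigl( \frac{\widehat{\mathbb{V}}_n^M(t)}{\sqrt{n}} - (p - t) \Bigr) + \rho_n^M(t, p)
\end{aligned}
$$

and

$$
\begin{aligned}
L_n'(t, p)
&= \sum_{k=1}^m \sqrt{N_{nk}} \, w_k(p) \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} w_k(p) \bigl( B_k(p) - B_k(t) \bigr) \\
&= \sum_{k=1}^m \sqrt{N_{nk}} \, w_k(t) \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} w_k(t) \beta_k(t)(p - t) + \rho_n^L(t, p) \\
&= \sum_{k=1}^m N_{nk} w_k(t) \beta_k(t) \Bigl( \frac{\widehat{\mathbb{V}}_n^L(t)}{\sqrt{n}} - (p - t) \Bigr) + \rho_n^L(t, p),
\end{aligned}
$$

where

$$
\begin{aligned}
|\rho_n^M(t, p)| &\le \frac{O(n) \, |p - t|^2}{t(1-t)}, \\
|\rho_n^L(t, p)| &\le \frac{O_p(\sqrt{n}) \, t^\delta (1-t)^\delta \, |p - t|}{t^2 (1-t)^2} + \frac{O(n) \, |p - t|^2}{t^2 (1-t)^2} .
\end{aligned}
$$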
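
Note that for $t \in [c_n, 1 - c_n]$,

$$
\frac{|p_n^Z(t) - t|}{t(1-t)} \le O_p(n^{-1/2}) \, t^{\delta - 1} (1-t)^{\delta - 1} \le O_p\bigl( n^{-1/2} c_n^{\delta - 1} \bigr).
$$

Hence we choose $c_n$ such that $c_n \downarrow 0$ but $n c_n^{2(1-\delta)} \to \infty$. With this choice we may conclude that uniformly in $t \in [c_n, 1 - c_n]$,

$$
\begin{aligned}
|\rho_n^M(t, p_n^M(t))| &\le O_p(1) \, t^{2\delta - 1} (1-t)^{2\delta - 1}, \\
|\rho_n^L(t, p_n^L(t))| &\le O_p(1) \, t^{2\delta - 2} (1-t)^{2\delta - 2} .
\end{aligned}
$$

On the other hand, it follows from $\sum_{k=1}^m \beta_k \equiv m$ that

$$
\sum_{k=1}^m N_{nk} \beta_k(t) \ge m \min_{k=1,\ldots,m} N_{nk},
$$

$$
\sum_{k=1}^m N_{nk} w_k(t) \beta_k(t) \ge \min_{k=1,\ldots,m} N_{nk} \, \frac{m \, c_w}{t(1-t)} .
$$

Consequently,

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p_n^M(t))
&= - b \sum_{k=1}^m N_{nk} \beta_k(t) \, \frac{t^\delta (1-t)^\delta}{\sqrt{n}} + \rho_n^M(t, p_n^M(t)) \\
&= \sum_{k=1}^m N_{nk} \beta_k(t) \, \frac{t^\delta (1-t)^\delta}{\sqrt{n}} \Bigl( -b + O_p\bigl( n^{-1/2} c_n^{\delta - 1} \bigr) \kappa_n^M(t) \Bigr)
\end{aligned}
$$

and

$$
\begin{aligned}
L_n'(t, p_n^L(t))
&= - b \sum_{k=1}^m N_{nk} w_k(t) \beta_k(t) \, \frac{t^\delta (1-t)^\delta}{\sqrt{n}} + \rho_n^L(t, p_n^L(t)) \\
&= \sum_{k=1}^m N_{nk} w_k(t) \beta_k(t) \, \frac{t^\delta (1-t)^\delta}{\sqrt{n}} \Bigl( -b + O_p\bigl( n^{-1/2} c_n^{\delta - 1} \bigr) \kappa_n^L(t) \Bigr)
\end{aligned}
$$

for some random functions $\kappa_n^M, \kappa_n^L : [c_n, 1 - c_n] \to [-1, 1]$. These considerations show that (5) and (6) are satisfied with $[c_n, 1 - c_n]$ in place of $(0,1)$.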

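It remains to verify (5) and (6) with $(0, c_n]$ in place of $(0,1)$; the interval $[1 - c_n, 1)$ may be treated analogously. Note first that for $2 \le k \le m$,

$$
B_k(t) \le B_2(t) \le m(m-1) t^2 / 2 \quad\text{and}\quad \beta_k(t) \le m 2^{m-1} t,
$$

so

$$
|B_k(p) - B_k(t)| = \Bigl| \int_t^p \beta_k(u) \, du \Bigr| \le O(\max(p, t)) \, |p - t| .
$$

Furthermore, since $B_1(t) = 1 - (1-t)^m$,

$$
B_1(p) - B_1(t) = m(p - t) + O(\max(t, p)) \, (p - t).
$$

Hence for $t \in (0, c_n]$ and $p \in (0, 2 c_n]$,

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p)
&= \sum_{k=1}^m \sqrt{N_{nk}} \, \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} \bigl( B_k(p) - B_k(t) \bigr) \\
&= \sqrt{N_{n1}} \, \mathbb{V}_{n1}(B_1(t)) - N_{n1} m (p - t) + \rho_n^M(t, p)
\end{aligned}
$$

and

$$
\begin{aligned}
L_n'(t, p)
&= \sum_{k=1}^m \sqrt{N_{nk}} \, w_k(p) \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} w_k(p) \bigl( B_k(p) - B_k(t) \bigr) \\
&= \sqrt{N_{n1}} \, w_1(p) \mathbb{V}_{n1}(B_1(t)) - N_{n1} w_1(p) m (p - t) + \rho_n^L(t, p),
\end{aligned}
$$

where

$$
\begin{aligned}
|\rho_n^M(t, p)| &\le O_p(\sqrt{n}) \, t^{2\delta} + O(n c_n) \, |p - t|, \\
|\rho_n^L(t, p)| &\le O_p(\sqrt{n}) \, p^{-1} t^{2\delta} + O(n c_n) \, p^{-1} |p - t| .
\end{aligned}
$$

Note also that

$$
\sup_{t \in (0, c_n]} \Bigl| \frac{\sqrt{n} \, (p_n^Z(t) - t)}{t^\delta (1-t)^\delta} - b \Bigr| \to_p 0.
$$

In particular, $\sup_{t \in (0, c_n]} p_n^Z(t) = c_n + O_p(n^{-1/2} c_n^\delta) = c_n (1 + o_p(1))$, and in case of $b > 0$, $P\bigl( p_n^Z(t) > 0 \text{ for } 0 < t \le c_n \bigr) \to 1$.

In case of $b > 0$, these considerations show that for $0 < t \le c_n$,

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p_n^M(t))
&= \sqrt{N_{n1}} \, \mathbb{V}_{n1}(B_1(t)) - N_{n1} m \bigl( p_n^M(t) - t \bigr) + \rho_n^M(t, p_n^M(t)) \\
&\le \frac{N_{n1} m \, t^\delta (1-t)^\delta}{\sqrt{n}} \bigl( -b + o_p(1) \bigr) + O_p(\sqrt{n}) \, t^{2\delta} + O(\sqrt{n} \, c_n) \, t^\delta \\
&= \frac{N_{n1} m \, t^\delta (1-t)^\delta}{\sqrt{n}} \bigl( -b + o_p(1) \bigr)
\end{aligned}
$$

and

$$
L_n'(t, p_n^L(t)) \le \frac{N_{n1} m \, w_1(p_n^L(t)) \, t^\delta (1-t)^\delta}{\sqrt{n}} \bigl( -b + o_p(1) \bigr).
$$

Analogously, in case of $b < 0$, for any $t \in (0, c_n]$ we obtain the inequalities

$$
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p_n^M(t)) \ge \begin{cases} 0 & \text{if } p_n^M(t) \le 0, \\[1ex] \dfrac{N_{n1} m \, t^\delta (1-t)^\delta}{\sqrt{n}} \bigl( -b + o_p(1) \bigr) & \text{if } p_n^M(t) > 0, \end{cases}
$$

$$
L_n'(t, p_n^L(t)) \ge \begin{cases} \infty & \text{if } p_n^L(t) \le 0, \\[1ex] \dfrac{N_{n1} m \, w_1(p_n^L(t)) \, t^\delta (1-t)^\delta}{\sqrt{n}} \bigl( -b + o_p(1) \bigr) & \text{if } p_n^L(t) > 0. \end{cases}
$$

Hence (5) and (6) are satisfied with $(0, c_n]$ in place of $(0,1)$. □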

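Proof of Theorem 2. It follows directly from Proposition 4 that $\mathbb{V}_n^S = \mathbb{V}_n^M$ and $\mathbb{V}_n^L$ converge in distribution to

$$
\mathbb{V}^S := \mathbb{V}^M := \frac{1}{\sqrt{m}} \sum_{k=1}^m \mathbb{V}_k \circ B_k \quad\text{and}\quad \mathbb{V}^L := \sqrt{m} \sum_{k=1}^m \gamma_k \, \mathbb{V}_k \circ B_k,
$$

respectively, with independent Brownian bridge processes $\mathbb{V}_1, \mathbb{V}_2, \ldots, \mathbb{V}_m$. Consequently, since $\mathbb{V}$ is centered with continuous sample paths, $\mathbb{V}^Z$ has these properties too. Moreover it follows from $E\bigl( \mathbb{V}_k(u) \mathbb{V}_k(v) \bigr) = \min\{u, v\} - uv$ for $u, v \in [0,1]$ that the covariance function $K^L$ is as stated, and

$$
\begin{aligned}
K^S(s,t) = K^M(s,t) &= \frac{1}{m} \sum_{k=1}^m \bigl( B_k(\min\{s,t\}) - B_k(s) B_k(t) \bigr) \\
&= \min\{s,t\} - \frac{1}{m} \sum_{k=1}^m B_k(s) B_k(t),
\end{aligned}
$$

because $m^{-1} \sum_{k=1}^m B_k(u) = u$ for $0 \le u \le 1$.

It remains to prove the inequalities for $K^Z(t,t)$, $0 < t < 1$. On the one hand, for $Z = S, M$,

$$
K^Z(t,t) = t - \frac{1}{m} \sum_{k=1}^m B_k(t)^2 = t(1-t) - \frac{1}{m} \sum_{k=1}^m \bigl( B_k(t) - t \bigr)^2 < t(1-t),
$$

because $B_1(t) > B_2(t) > \cdots > B_m(t)$.

On the other hand, the definition of $\gamma_k$ and $w_k$ implies that

$$
\begin{aligned}
K^L(t,t) &= m \sum_{k=1}^m \gamma_k(t)^2 B_k(t) \bigl( 1 - B_k(t) \bigr) \\
&= m \sum_{k=1}^m w_k(t)^2 B_k(t) \bigl( 1 - B_k(t) \bigr) \Big/ \Bigl( \sum_{k=1}^m \beta_k(t) w_k(t) \Bigr)^2 \\
&= \Bigl( \sum_{k=1}^m \frac{\beta_k(t)}{m} \, w_k(t) \Bigr)^{-1},
\end{aligned}
$$

where the last step uses $w_k B_k (1 - B_k) = \beta_k$. But $\sum_{k=1}^m \beta_k(t)/m = 1$, so it follows from Jensen's inequality that

$$
K^L(t,t) \le \sum_{k=1}^m \frac{\beta_k(t)}{m} \, w_k(t)^{-1} = \frac{1}{m} \sum_{k=1}^m B_k(t) \bigl( 1 - B_k(t) \bigr) = K^S(t,t) = K^M(t,t)
$$

with strict inequality unless $w_1(t) = w_2(t) = \cdots = w_m(t)$.

That the numbers $w_1(t), w_2(t), \ldots, w_m(t)$ are not identical, except for $t = 1/2$ and $m = 2$, can be verified as follows: Elementary algebra reveals that

$$
w_1(t) = \frac{m}{(1-t) \bigl( 1 - (1-t)^m \bigr)} \quad\text{and}\quad w_m(t) = \frac{m}{t (1 - t^m)} .
$$

Thus $t = 1/2$ is the only solution of $w_1(t) = w_m(t)$. Moreover, one can easily show that $w_1(1/2) = w_2(1/2)$ if, and only if, $2^{m+1} = m^2 + m + 2$. But $h(m) := 2^{m+1} / (m^2 + m + 2)$ is easily seen to satisfy $h(1) = h(2) = 1$ and $h(m+1)/h(m) > 1$ for $m \ge 2$. Thus $w_1(1/2) \ne w_2(1/2)$ whenever $m > 2$. □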
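The elementary claims about $h$ at the end of this proof can be double-checked with exact rational arithmetic. The following Python snippet is only a finite sanity check for a range of $m$ (the upper limit 200 is an arbitrary choice), not a substitute for the algebraic argument.

```python
from fractions import Fraction

def h(m):
    """h(m) = 2^(m+1) / (m^2 + m + 2), computed exactly."""
    return Fraction(2 ** (m + 1), m * m + m + 2)

values = {m: h(m) for m in range(1, 201)}
ratios = [values[m + 1] / values[m] for m in range(2, 200)]
print(values[1], values[2], values[3])   # 1, 1, 8/7
print(min(ratios) > 1)                   # h is strictly increasing from m = 2 on
```

Indeed, $h(m+1)/h(m) = 2(m^2 + m + 2)/(m^2 + 3m + 4)$, which exceeds $1$ exactly when $m^2 > m$.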

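Proof of Theorem 3. For symmetry reasons it suffices to prove the first part about the left tails. Let $(c_n)_n$ be a sequence of numbers in $(0, 1/2]$ converging to zero. Then for $t \in (0, c_n]$ and $p \in (0,1)$,

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p)
&= \sum_{k=1}^m \sqrt{N_{nk}} \, \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} \bigl( B_k(p) - B_k(t) \bigr) \\
&= \sqrt{N_{n1}} \, \mathbb{V}_{n1}(B_1(t)) - N_{n1} m (p - t) + \rho_n^M(t, p) \\
&= N_{n1} m \Bigl( \frac{\mathbb{V}_n^{(\ell)}(t)}{\sqrt{n}} - (p - t) \Bigr) + \rho_n^M(t, p)
\end{aligned}
$$

and

$$
\begin{aligned}
L_n'(t, p)
&= \sum_{k=1}^m \sqrt{N_{nk}} \, w_k(p) \mathbb{V}_{nk}(B_k(t)) - \sum_{k=1}^m N_{nk} w_k(p) \bigl( B_k(p) - B_k(t) \bigr) \\
&= \sqrt{N_{n1}} \, w_1(p) \mathbb{V}_{n1}(B_1(t)) - N_{n1} w_1(p) m (p - t) + \rho_n^L(t, p) \\
&= N_{n1} m \, w_1(p) \Bigl( \frac{\mathbb{V}_n^{(\ell)}(t)}{\sqrt{n}} - (p - t) \Bigr) + \rho_n^L(t, p),
\end{aligned}
$$

where

$$
\begin{aligned}
|\rho_n^M(t, p)| &\le o_p(\sqrt{n}) \, t^{2\delta} + O(n) \max(t, p) \, |p - t|, \\
|\rho_n^L(t, p)| &\le o_p(\sqrt{n}) \, p^{-1} t^{2\delta} + O(n) \, p^{-1} \max(t, p) \, |p - t| .
\end{aligned}
$$

Now we proceed similarly as in the proof of Theorem 1, defining

$$
p_n(t) := t + n^{-1/2} \bigl( \mathbb{V}_n^{(\ell)}(t) + b \, t^\kappa \bigr)
$$

for some fixed $b \ne 0$. Note that for $t \in (0, c_n]$,

$$
|p_n(t) - t| \le O_p(n^{-1/2}) \, t^\delta + O(n^{-1/2}) \, t^\kappa = O_p(n^{-1/2}) \, t^\delta,
$$

because $\kappa > \delta$. Note also that

$$
t + \frac{\mathbb{V}_n^{(\ell)}(t)}{\sqrt{n}} = t + \frac{\hat B_{n1}(t) - B_1(t)}{m} = t - \frac{1 - (1-t)^m}{m} + \frac{\hat B_{n1}(t)}{m} \ge 0,
$$

because $\hat B_{n1} \ge 0$ and $t - (1 - (1-t)^m)/m \ge 0$ for $0 < t < 1$. Thus $p_n(t) > 0$ for all $t \in (0, c_n]$ in case of $b > 0$.

In case of $b > 0$ we may conclude that

$$
\begin{aligned}
n \hat B_n(t) - \sum_{k=1}^m N_{nk} B_k(p_n(t))
&= - \frac{N_{n1} m \, b \, t^\kappa}{\sqrt{n}} + \rho_n^M(t, p_n(t)) \\
&\le \frac{N_{n1} m}{\sqrt{n}} \Bigl( -b \, t^\kappa + o_p(1) \, t^{2\delta} + O_p(1) \bigl( t + O_p(n^{-1/2}) \, t^\delta \bigr) t^\delta \Bigr)
\end{aligned}
$$

and

$$
L_n'(t, p_n(t)) \le \frac{N_{n1} m \, w_1(p_n(t))}{\sqrt{n}} \Bigl( -b \, t^\kappa + o_p(1) \, t^{2\delta} + O_p(1) \bigl( t + O_p(n^{-1/2}) \, t^\delta \bigr) t^\delta \Bigr).
$$

Now we choose $\delta = \kappa/2$ and conclude that for any fixed $b > 0$,

$$
P\Bigl( \sqrt{n} \, (\hat B_n^Z(t) - t) \le \mathbb{V}_n^{(\ell)}(t) + b \, t^\kappa \text{ for } t \in (0, c_n] \Bigr) \to 1.
$$

Similarly we can show that for any fixed $b < 0$, with asymptotic probability one,

$$
\sqrt{n} \, (\hat B_n^Z(t) - t) \ge \mathbb{V}_n^{(\ell)}(t) + b \, t^\kappa \quad\text{for all } t \in (0, c_n].
$$

□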



6 Concluding remarks 

The methods in Sections 2 and 3 may be extended easily to a more general setting with independent observations $(X_i, R_i, m_i)$, $1 \le i \le n$, where $m_i \ge 1$ is a fixed integer, $R_i \in \{1, 2, \ldots, m_i\}$ is a fixed or random rank, and

$$
P(X_i \le x \mid R_i = k) = B_{k, m_i + 1 - k}(F(x)).
$$

Here $B_{k,\ell}$ denotes the distribution function of the beta distribution with parameters $k$ and $\ell$.

Another possible extension involves imprecise ranking as considered by MacEachern et al. (2004). Suppose that $R_i$ is a probability distribution on $\{1, 2, \ldots, m\}$ rather than a single number, and that the conditional distribution function of $X_i$, given $R_i$, equals

$$
P(X_i \le x \mid R_i) = \sum_{k=1}^m R_i(\{k\}) B_k(F(x)).
$$

Then the equation for $\hat F_n^M(x)$ may be rewritten as

$$
n \hat F_n(x) = \sum_{i=1}^n \sum_{k=1}^m R_i(\{k\}) B_k\bigl( \hat F_n^M(x) \bigr),
$$

and the distribution function $G_{N_n,p}(y)$ should be replaced with the distribution function $G_{R_n,p}$ of $\sum_{i=1}^n Z_{i,p}$, where $Z_{1,p}, Z_{2,p}, \ldots, Z_{n,p}$ are independent with

$$
Z_{i,p} \sim \mathrm{Bin}\Bigl( 1, \sum_{k=1}^m R_i(\{k\}) B_k(p) \Bigr).
$$